PSCI 2270 - Week 4
Department of Political Science, Vanderbilt University
September 19, 2024
Some math… LLN and CLT
Logic of causal inference
Probability:
Law of Large Numbers
Central Limit Theorem:
Suppose we collect \(n\) observations: \(X_1, X_2, \dots, X_n\)
We then summarize these \(n\) observations by calculating a statistic, e.g., the mean
How do we know how far this is from the same summary computed on all \(N\) (or even infinitely many!) units in the population?
Law of Large Numbers (LLN)
Let \(X_1, \dots, X_n\) be i.i.d. random variables with mean \(\mu\) and finite variance \(\sigma^2\). Then the sample mean \(\bar{X}_{n}\) converges to \(\mu\) as \(n\) gets large.
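A quick simulation makes the LLN concrete. This is a minimal sketch, assuming i.i.d. Bernoulli draws with a made-up mean \(p = 0.39\) (so \(\mu = p\)); the helper `running_means` and the seed are hypothetical choices. The running sample mean should settle near \(p\) as \(n\) grows.

```python
import random

def running_means(p: float, n: int, seed: int = 0) -> list[float]:
    """Hypothetical helper: sample mean after each of n i.i.d. Bernoulli(p) draws."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    total = 0
    means = []
    for i in range(1, n + 1):
        total += 1 if rng.random() < p else 0   # one Bernoulli(p) draw
        means.append(total / i)                 # sample mean of the first i draws
    return means

# Assumed population mean p = 0.39 (made up for illustration).
means = running_means(p=0.39, n=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: sample mean = {means[n - 1]:.4f}")
```

Small samples can land far from 0.39; by \(n = 100{,}000\) the sample mean is typically within a few thousandths of it.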
Central Limit Theorem (CLT)
Let \(X_1, \dots, X_n\) be an i.i.d. sample from a population with mean \(\mu\) and finite variance \(\sigma^2\). Then, as \(n\) gets large, the sample mean \(\bar{X}_n\) is approximately distributed \(N(\mu, \sigma^2 / n)\).
Intuition: imagine you could collect many (large) samples and calculate the mean of each; the resulting means form a sampling distribution that is approximately normal, centered at \(\mu\), with variance \(\sigma^2 / n\)
Important result: we now know how far the sample mean \(\bar{X}_n\) is likely to be from the population mean \(\mu\) of \(X_i\)!
Distribution: each possible value of \(X\) \(\rightarrow\) the probability of \(X\) taking that value
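To see the CLT at work, the sketch below draws many samples from a skewed, clearly non-normal population (an Exponential distribution with mean 1 and SD 1, chosen here only for illustration) and checks that the sample means behave like \(N(\mu, \sigma^2/n)\). The sample size \(n = 50\), the 10,000 repetitions, and the seed are arbitrary assumptions.

```python
import math
import random
import statistics

# Assumed population: Exponential with rate 1, which is skewed
# (not normal) and has mean 1 and standard deviation 1.
MU, SIGMA = 1.0, 1.0
n, reps = 50, 10_000          # sample size and number of repeated samples
rng = random.Random(42)       # fixed seed so the run is reproducible

# Draw `reps` independent samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

se = SIGMA / math.sqrt(n)     # CLT standard error: sigma / sqrt(n)
# Share of sample means within 1.96 SE of mu; about 0.95 if the normal approximation holds.
share_within = sum(abs(m - MU) <= 1.96 * se for m in sample_means) / reps

print(f"mean of sample means: {statistics.fmean(sample_means):.3f} (mu = {MU})")
print(f"SD of sample means:   {statistics.stdev(sample_means):.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1.96 SE: {share_within:.3f} (normal approximation: 0.95)")
```

Even though individual draws are far from normal, the sample means cluster around \(\mu\) with spread close to \(\sigma/\sqrt{n}\).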
The Normal distribution is one of the most ubiquitous distributions in statistics
Three key properties:
We usually only have one sample, so we’ll only get one sample mean. So why do we care about LLN/CLT?
\[ SE = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} \]
Latest Gallup poll:
Our data: a simple random sample \(X_1, \dots, X_n\) of size \(n\) from some population
Point estimation: providing a single “best guess” as to the value of some fixed, unknown quantity of interest, \(\theta\) (read theta)
Examples of quantities of interest (estimands):
Estimator
An estimator, \(\hat{\theta}\), of some parameter \(\theta\), is a statistic: \(\hat{\theta} = h(X_1, \dots, X_n)\).
An estimate is one particular realization of the estimator
There are many (\(\infty\)) different possible estimators:
How good are these different estimators?
We usually rely on the sample mean, partly because the LLN and CLT apply to it
\[ \underbrace{\text{estimate}}_{\text{sample mean, }\bar{X}} = \underbrace{\text{estimand}}_{\text{population mean, }p} + \text{noise} \]
Remember: the sample mean is a random variable
Expectation: average of the estimates across repeated samples
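Because the sample mean is a random variable, its expectation can be approximated by simulation: draw many samples, compute one estimate from each, and average them. A minimal sketch, assuming a Bernoulli population whose true proportion is set to 0.39 with samples of size 1007 (borrowing the poll numbers as if they were the truth); the helper `sample_mean` and the seed are hypothetical. In a simulation we know \(p\), so we can also look at the noise directly.

```python
import random
import statistics

def sample_mean(p: float, n: int, rng: random.Random) -> float:
    """Hypothetical helper: one estimate, the sample mean of n Bernoulli(p) draws."""
    return sum(rng.random() < p for _ in range(n)) / n

# Assumed "true" population proportion and sample size (made up for the simulation).
p, n, reps = 0.39, 1007, 2_000
rng = random.Random(2024)

estimates = [sample_mean(p, n, rng) for _ in range(reps)]   # one estimate per repeated sample
noise = [est - p for est in estimates]                      # estimate - estimand

print(f"one estimate:         {estimates[0]:.4f}")
print(f"average of estimates: {statistics.fmean(estimates):.4f} (p = {p})")
print(f"average noise:        {statistics.fmean(noise):+.4f}")
```

Any single estimate misses \(p\) by some noise, but the average across repeated samples sits very close to \(p\), i.e., the sample mean is unbiased.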
\[\sqrt{\mathrm{Var}(\bar{X})} = \sqrt{\frac{p(1 − p)}{n}}\]
\[\sqrt{\widehat{\mathrm{Var}}(\bar{X})} = \sqrt{\frac{\bar{X}(1 - \bar{X})}{n}} = \sqrt{\frac{0.39 (1 - 0.39)}{1007}} \approx 0.0154\]
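The plug-in standard error is a one-line calculation. A minimal sketch (the function name `estimated_se` is hypothetical), using the sample proportion 0.39 and sample size 1007 from the poll:

```python
import math

def estimated_se(x_bar: float, n: int) -> float:
    """Hypothetical helper: plug-in SE of a sample proportion, sqrt(x_bar * (1 - x_bar) / n)."""
    return math.sqrt(x_bar * (1 - x_bar) / n)

se_hat = estimated_se(0.39, 1007)   # sample proportion and sample size from the poll
print(f"{se_hat:.4f}")              # prints 0.0154
```

Because \(n\) sits under a square root, quadrupling the sample size only halves the standard error.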
\[ \bar{X} − p = \text{noise}\]
How can we figure out a range of plausible noise?
\[\bar{X} \sim N \left( \underbrace{\mathbb{E}[X_i]}_{p}, \underbrace{\frac{\mathrm{Var}(X_i)}{n}}_{\frac{p(1-p)}{n}} \right)\]
First, choose a confidence level.
\(100 \times (1 - \alpha)\)% confidence interval: \(CI = \bar{X} \pm z_{\alpha/2} \times SE\)
This kind of sample-to-population inference works when all relevant outcomes within the sample are observed
But for causal (“what-if”) questions, we cannot observe all relevant outcomes even within the sample \(\Rightarrow\) internal validity
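The confidence-interval formula translates directly into code. A minimal sketch, assuming the normal approximation and the plug-in SE for a proportion; the function name `proportion_ci` is hypothetical, while `statistics.NormalDist().inv_cdf` is the standard-library normal quantile function.

```python
import math
from statistics import NormalDist

def proportion_ci(x_bar: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Hypothetical helper: normal-approximation CI for a proportion, x_bar +/- z_{alpha/2} * SE."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}; about 1.96 when confidence = 0.95
    se = math.sqrt(x_bar * (1 - x_bar) / n)   # plug-in standard error
    return x_bar - z * se, x_bar + z * se

low, high = proportion_ci(0.39, 1007)
print(f"95% CI: [{low:.3f}, {high:.3f}]")     # prints 95% CI: [0.360, 0.420]
```

Raising `confidence` to 0.99 increases \(z_{\alpha/2}\) to about 2.58 and widens the interval: more confidence costs precision.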
Does the minimum wage increase the unemployment rate?
Does having a daughter affect a judge’s rulings in court?
Fundamental problem of causal inference
Question: Does having a female head of the village council increase the share of the budget allocated to water sanitation?
Setting: 8 randomly sampled villages in Indonesia (some with a female head, some with a male head)
Outcome: Share of budget each village spends on water sanitation
| Village | Head of Council | Budget Share |
|---|---|---|
| Village 1 | Female | 15% |
| Village 2 | Male | 10% |
Treatment (\(T_i = 1\)) group: Villages with female head of council
Control (\(T_i = 0\)) group: Villages with male head of council
| Village | \(T_i\) (Head of Council) | \(Y_i\) (Budget Share) |
|---|---|---|
| Village 1 | 1 | 15 |
| Village 2 | 0 | 10 |
What does “\(T_i\) causes \(Y_i\)” mean?
Imagine two states of the world: one in which you receive some treatment and another in which you do not \(\Rightarrow\) potential outcomes
(Individual) Treatment effect: \(Y_i (1) − Y_i (0)\)
Average Treatment Effect (ATE):
\[ \frac{1}{n} \sum_{i = 1}^{n} Y_i (1) − \frac{1}{n} \sum_{i = 1}^{n} Y_i (0) = \frac{1}{n} \sum_{i = 1}^{n} \left[ Y_i (1) − Y_i (0) \right] \]
| Village | \(T_i\) (Head of Council) | \(Y_i\) (Budget Share) | \(Y_i (0)\) (Budget Share if Male Head) | \(Y_i (1)\) (Budget Share if Female Head) |
|---|---|---|---|---|
| Village 1 | 1 | 15 | ??? (11? 16? 14? 10?) | 15 |
| Village 2 | 0 | 10 | 10 | ??? (12? 7? 9? 15?) |
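The two forms of the ATE formula give the same number. A minimal check, assuming entirely hypothetical potential outcomes for four villages: Villages 1 and 2 keep their observed values from the table, and each unknown counterfactual is filled with one of the guesses (11 and 12), something real data never allow.

```python
# Entirely hypothetical potential outcomes (budget share, %) for four villages.
# Real data never reveal both values for the same village.
y0 = [11, 10, 8, 14]    # Y_i(0): budget share if the council head is male
y1 = [15, 12, 12, 14]   # Y_i(1): budget share if the council head is female

n = len(y0)
individual_effects = [a1 - a0 for a1, a0 in zip(y1, y0)]   # Y_i(1) - Y_i(0)

ate_diff_of_means = sum(y1) / n - sum(y0) / n    # left-hand side of the ATE formula
ate_mean_of_diffs = sum(individual_effects) / n  # right-hand side of the ATE formula

print(individual_effects)                      # [4, 2, 4, 0]
print(ate_diff_of_means, ate_mean_of_diffs)    # 2.5 2.5
```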
Fundamental problem of causal inference:
Observe \(Y_i = Y_i (1)\) if \(T_i = 1\) or \(Y_i = Y_i (0)\) if \(T_i = 0\)
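The observation rule above can be sketched in a few lines. This assumes the same kind of entirely hypothetical potential outcomes and a hypothetical assignment vector `t`; the code shows that, whatever the assignment, each village reveals exactly one of its two potential outcomes.

```python
# Entirely hypothetical potential outcomes (budget share, %), made up for illustration.
y0 = [11, 10, 8, 14]    # Y_i(0)
y1 = [15, 12, 12, 14]   # Y_i(1)
t = [1, 0, 1, 0]        # hypothetical treatment assignment: 1 = female head

# Observed outcome: Y_i = Y_i(1) if T_i = 1, else Y_i(0).
# Equivalently, Y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0).
observed = [y1_i if t_i == 1 else y0_i for y0_i, y1_i, t_i in zip(y0, y1, t)]

for i, (t_i, y_i) in enumerate(zip(t, observed), start=1):
    y0_shown = y_i if t_i == 0 else "???"   # counterfactual hidden for treated villages
    y1_shown = y_i if t_i == 1 else "???"   # counterfactual hidden for control villages
    print(f"Village {i}: T = {t_i}, Y = {y_i}, Y(0) = {y0_shown}, Y(1) = {y1_shown}")
```

Half of the potential-outcome table is always missing, so individual treatment effects cannot be computed directly from data.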
Find a similar unit! \(\Rightarrow\) matching
Did village spend more on water sanitation because of female council head?
NJ increased the minimum wage. Causal effect on unemployment?
The problem: imperfect matches!
Say we match villages \(i\) (treated) and \(j\) (control)
Selection Bias: \(Y_i (1) \neq Y_j (1)\) or \(Y_i (0) \neq Y_j (0)\)
Those who take the treatment may be different from those who take the control
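For a matched pair, the comparison we can actually compute splits into the effect we want plus a selection-bias term. As a worked example, take Village 1 (treated, \(i\)) and Village 2 (control, \(j\)), with \(Y_i(1) = 15\) and \(Y_j(0) = 10\) observed, and suppose, purely hypothetically, that Village 1's unobserved \(Y_i(0)\) is 11:

\[ \underbrace{Y_i(1) - Y_j(0)}_{\text{matched comparison}} = \underbrace{Y_i(1) - Y_i(0)}_{\text{treatment effect for } i} + \underbrace{Y_i(0) - Y_j(0)}_{\text{selection bias}} \]

\[ 15 - 10 = (15 - 11) + (11 - 10) = 4 + 1 \]

Here the matched comparison (5) overstates Village 1's treatment effect (4) by the selection bias (1), which is nonzero exactly because \(Y_i(0) \neq Y_j(0)\).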
How can we correct for that?
RANDOMIZE! 😵💫
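Why does randomization help? Averaged over all possible random assignments, the simple difference in means equals the ATE (it is unbiased), even though any single assignment can miss. A minimal sketch, assuming entirely hypothetical potential outcomes for the 8 villages and complete randomization of exactly 4 treated villages; because there are only \(\binom{8}{4} = 70\) possible assignments, it enumerates them all instead of sampling.

```python
from itertools import combinations
from statistics import fmean

# Entirely hypothetical potential outcomes for the 8 villages (budget share, %).
y0 = [11, 10, 8, 14, 9, 11, 13, 7]     # Y_i(0)
y1 = [15, 12, 12, 14, 12, 15, 14, 9]   # Y_i(1)
n = len(y0)

true_ate = fmean(a1 - a0 for a1, a0 in zip(y1, y0))   # known only because we made up both columns

# Complete randomization: every way of choosing n/2 treated villages is equally likely.
estimates = []
for treated in combinations(range(n), n // 2):
    treated_set = set(treated)
    # Under this assignment we observe Y(1) for treated and Y(0) for control villages.
    treated_mean = fmean(y1[i] for i in treated_set)
    control_mean = fmean(y0[i] for i in range(n) if i not in treated_set)
    estimates.append(treated_mean - control_mean)

print(f"true ATE: {true_ate}")
print(f"possible assignments: {len(estimates)}")
print(f"average difference in means: {fmean(estimates):.4f}")
print(f"single-assignment estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
```

The spread between the smallest and largest estimates plays the role of noise; what randomization guarantees is that the average across assignments equals the true ATE.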